Matrix multiplication

In mathematics, matrix multiplication is a binary operation that takes a pair of matrices, and produces another matrix. If A is an n-by-m matrix and B is an m-by-p matrix, the result AB of their multiplication is an n-by-p matrix defined only if the number of columns m of the left matrix A is the equal to the number of rows of the right matrix B.

Contents

Matrix multiplication

Non-technical details

The result of matrix multiplication is a matrix whose elements are found by multiplying the elements within a row from the first matrix by the associated elements within a column from the second matrix and summing the products.

The procedure for finding an element of the resultant matrix is to multiply the first element of a given row from the first matrix times the first element of a given column from the second matrix, then add to that the product of the second element of the same row from the first matrix and the second element of the same column from the second matrix, then add the product of the third elements and so on, until the last element of that row from the first matrix is multiplied by the last element of that column from the second matrix and added to the sum of the other products.

Alternative explanation

Matrix multiplication is a way to combine two matrices and get a third matrix. One may compute each entry in the third matrix one at a time. The i,j entry in the third matrix is the sum of the products of the elements in the ith row in the first matrix and the jth column in second matrix. Suppose the ith row equals [ai1,ai2,...,ain] and the jth column equals [b1j,b2j,...,bnj]. Then the i,i entry of the third matrix is ai1b1j + ai2b2j + ... + ainbni.

Non-technical example

if A=\begin{bmatrix}a&b\\c&d\end{bmatrix} and B=\begin{bmatrix}e&f\\g&h\end{bmatrix} then [1]AB=\begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}e&f\\g&h\end{bmatrix}=\begin{bmatrix}ae%2Bbg&af%2Bbh\\ce%2Bdg&cf%2Bdh\end{bmatrix}

Illustration

The figure to the right illustrates the product of two matrices A and B, showing how each intersection in the product matrix corresponds to a row of A and a column of B. The size of the output matrix is always the largest possible, i.e. for each row of A and for each column of B there are always corresponding intersections in the product matrix. The product matrix AB consists of all combinations of dot products of rows of A and columns of B.

The values at the intersections marked with circles are:

\begin{align}
{\color{Red}x_{1,2}} &= (a_{1,1}, a_{1,2})\cdot(b_{1,2}, b_{2,2}) \\
 &= a_{1, 1}b_{1,2} %2B a_{1,2}b_{2, 2} \\
{\color{Blue}x_{3,3}} &= (a_{3,1}, a_{3, 2})\cdot(b_{1, 3}, b_{2, 3}) \\
 &= a_{3, 1}b_{1,3} %2B a_{3,2}b_{2, 3}.
\end{align}

In general, the matrix product is non-commutative. For example,


  \overset{3\times 4 \text{ matrix}}{\begin{bmatrix}
     \cdot & \cdot & \cdot & \cdot \\
     \cdot & \cdot & \cdot & \cdot \\
     \color{Blue} 1 & \color{Blue} 2 & \color{Blue} 3 & \color{Blue} 4 \\
  \end{bmatrix}}
  \overset{4\times 5\text{ matrix}}{\begin{bmatrix}
    \cdot & \cdot & \cdot & \color{Red}a & \cdot \\
    \cdot & \cdot & \cdot & \color{Red}b & \cdot \\
    \cdot & \cdot & \cdot & \color{Red}c & \cdot \\
    \cdot & \cdot & \cdot & \color{Red}d & \cdot \\
  \end{bmatrix}}
=
\overset{3\times 5\text{ matrix}}{
\begin{bmatrix}
\cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & x_{3,4} & \cdot \\
\end{bmatrix}}.

The element x_{3,4} of the above matrix product is computed as follows

x_{3,4} =
({\color{Blue}1}, {\color{Blue}2}, {\color{Blue}3}, {\color{Blue}4})\cdot
({\color{Red}a}, {\color{Red}b}, {\color{Red}c}, {\color{Red}d})
= {\color{Blue} 1}\times{\color{Red} a}
%2B{\color{Blue} 2}\times{\color{Red} b}
%2B{\color{Blue} 3}\times{\color{Red} c}
%2B{\color{Blue} 4}\times{\color{Red} d}.

The first coordinate in matrix notation denotes the row and the second the column; this order is used both in indexing and in giving the dimensions. The element x_{{\color{Blue}i}{\color{Red}j}} at the intersection of row i and column j of the product matrix is the dot product (or scalar product) of row i of the first matrix and column j of the second matrix. This explains why the width and the height of the matrices being multiplied must match: otherwise the dot product is not defined.

Application example

A company sells cement, chalk and plaster in bags weighing 25, 10, and 5 kg respectively. Four construction firms Arcen, Build, Construct and Demolish, buy these products from this company. The number of bags the clients buy in a specific year may be arranged in a 4×3-matrix A, with columns for the products and rows representing the clients:

A=\begin{bmatrix}14&9&3\\2&11&15\\0&12&17\\5&2&3\end{bmatrix}.

We see for instance that A_{3, 2}=12, indicating that client Construct has bought 12 bags of chalk that year.

A bag of cement costs $12, a bag of chalk $9 and a bag of plaster $8. The 3×2-matrix B shows prices and weights of the three products:

B=\begin{bmatrix}12&25\\9&10\\8&5\end{bmatrix}.

To find the total amount firm Arcen has spent that year, we calculate:

14\times 12 %2B 9\times 9 %2B 3\times 8 =273,

in which we recognize the first row of the matrix A (Arcen) and the first column of the matrix B (prices).

The total weight of the product bought by Arcen is calculated in a similar manner:

14\times 25 %2B 9\times 10 %2B 3\times 5 =455,

in which we now recognize the first row of the matrix A (Arcen) and the second column of the matrix B (weight).

We can make similar calculations for the other clients. Together they form the matrix AB as the matrix product of the matrices A and B:

AB=\begin{bmatrix}14&9&3\\2&11&15\\0&12&17\\5&2&3\end{bmatrix}
\begin{bmatrix}12&25\\9&10\\8&5\end{bmatrix}=
\begin{bmatrix}273&455\\243&235\\244&205\\102&160\end{bmatrix}.

Technical details

The matrix product is the most commonly used type of product of matrices. Matrices offer a concise way of representing linear transformations between vector spaces, and matrix multiplication corresponds to the composition of linear transformations. The matrix product of two matrices can be defined when their entries belong to the same ring, and hence can be added and multiplied, and, additionally, the number of the columns of the first matrix matches the number of the rows of the second matrix. The product of an m×p matrix A with a p×n matrix B is an m×n matrix denoted AB whose entries are

 (AB)_{i,j} = \sum_{k=1}^p A_{ik}B_{kj},

where 1 ≤ im is the row index and 1 ≤ jn is the column index. This definition can be restated by postulating that the matrix product is left and right distributive and the matrix units are multiplied according to the following rule:

\ E_{ik} E_{lj} = \delta_{kl} E_{ij},

where the first factor is the m×n matrix with 1 at the intersection of the ith row and the kth column and zeros elsewhere and the second factor is the p×n matrix with 1 at the intersection of the lth row and the jth column and zeros elsewhere.

Properties of matrix multiplication

In general, matrix multiplication is not commutative. More precisely, AB and BA need not be simultaneously defined; if they are, they may have different dimensions; and even if A and B are square matrices of the same order n, so that AB and BA are also square matrices of order n, if n is greater or equal than 2, AB need not be equal to BA. For example,

 E_{11}E_{12}=E_{12}, whereas E_{12}E_{11}=0.

However, if A and B are both diagonal square matrices, and of the same order, then AB = BA.

Matrix multiplication is associative:

\ {A} ( {B C} ) = ( {A B} ) {C}.

Matrix multiplication is distributive over matrix addition:

\ A ({B} %2B {C} ) = {A B} %2B {AC}, \quad
({A} %2B {B} ){C} = {A C} %2B {B C},

provided that the expression in either side of each identity is defined.

Matrix multiplication is compatible with scalar multiplication:

\  c(AB) = (cA)B = A(cB)

where c is a scalar (for the second identity to hold, c must belong to the center of the ground ring — this condition is automatically satisfied if the ground ring is commutative, in particular, for matrices over a field).

If A and B are both nxn matrices with entries in a field then the determinant of their product is the product of their determinants:

\;\!\det(AB) = \det(A)\det(B).

In particular, the determinants of AB and BA coincide.

Let U, V, and W be vector spaces over the same field with certain bases, S: VW and T: UV be linear transformations and ST: UW be their composition. Suppose that A, B, and C are the matrices of T, S, and ST with respect to the given bases. Then

 AB=C.\,

Thus the matrix of the composition (or the product) of linear transformations is the product of their matrices with respect to the given bases.

Product of several matrices

Matrix multiplication can be extended to the case of several matrices, provided that their dimensions match, and it is associative (i.e. the result does not depend on the way the factors are grouped together) as long as their order is not changed. If A, B, C, and D are, respectively, m×p, p×q, q×r, and r×n matrices, then there are 5 ways of grouping them without changing their order, and

\ ((AB)C)D=(A(BC))D=A((BC)D)=A(B(CD))=(AB)(CD)

is an m×n matrix denoted ABCD.

Alternative descriptions

The Euclidean inner product and outer product are the simplest special cases of the matrix product. The inner product of two column vectors A and B is

A\cdot B = A^TB,

where T denotes the matrix transpose.

More explicitly,

A\cdot B = A^TB =
\begin{bmatrix}a_1 & a_2 & \cdots & a_n\end{bmatrix}
\begin{bmatrix}b_1 \\ b_2 \\ \vdots \\ b_n\end{bmatrix}
= \begin{bmatrix} a_1b_1%2Ba_2b_2%2B\cdots%2Ba_nb_n  \end{bmatrix}.

The outer product is A\otimes B = AB^T, where

AB^T =
\begin{bmatrix}a_1 \\ a_2 \\ \vdots \\ a_n\end{bmatrix}
\begin{bmatrix}b_1 & b_2 & \cdots & b_n\end{bmatrix}
= \begin{bmatrix}
a_1 b_1 & a_1 b_2 & \cdots & a_1 b_n \\
a_2 b_1 & a_2 b_2 & \cdots & a_2 b_n \\
\vdots & \vdots & \ddots & \vdots \\
a_n b_1 & a_n b_2 & \cdots & a_n b_n \\
\end{bmatrix}.

Matrix multiplication can be viewed in terms of these two operations by considering the effect of the matrix product on block matrices:

Suppose that the first factor, A, is decomposed into its rows, which are row vectors and the second factor, B, is decomposed into its columns, which are column vectors:

{A} =
\begin{bmatrix}
{\color{Red} a_{1,1}} & {\color{Red}a_{1, 2}} & \cdots & {\color{Red} a_{1, n}} \\
{\color{ForestGreen} a_{2,1}} & {\color{ForestGreen} a_{2, 2}} & \cdots &
{\color{ForestGreen} a_{2, n}} \\
\vdots & \vdots & \ddots & \vdots \\
{\color{Blue} a_{m, 1}} & {\color{Blue} a_{m, 2}} & \cdots &
{\color{Blue} a_{m, n}}
\end{bmatrix}
= 
\begin{bmatrix}
{\color{Red} A_1} \\ {\color{ForestGreen} A_2} \\ \vdots \\ {\color{Blue} A_m}
\end{bmatrix}
{B} =
\begin{bmatrix}
{\color{Red}b_{1,1}} & {\color{ForestGreen}b_{1, 2}} & \cdots & {\color{Blue}b_{1, p}} \\
{\color{Red}b_{2,1}} & {\color{ForestGreen}b_{2, 2}} & \cdots & {\color{Blue}b_{2, p}} \\
\vdots & \vdots & \ddots & \vdots \\
{\color{Red}b_{n, 1}} & {\color{ForestGreen}b_{n, 2}} & \cdots & {\color{Blue}b_{n, p}}
\end{bmatrix}
= 
\begin{bmatrix}
{\color{Red} B_1} & {\color{ForestGreen} B_2} & \cdots & {\color{Blue} B_p}
\end{bmatrix}

where

A_i = \begin{bmatrix}a_{i, 1} & a_{i, 2} & \cdots & a_{i, n} \end{bmatrix}
B_i = \begin{bmatrix}b_{1, i} & b_{2, i} & \cdots & b_{n, i}\end{bmatrix}^T.

The method in the introduction was:


{AB} = 
\begin{bmatrix}
   A_1 \\
   A_2 \\
   \vdots \\
   A_m
\end{bmatrix}
\begin{bmatrix} B_1 & B_2 & \dots & B_p
\end{bmatrix}
= \begin{bmatrix}
(A_1 \cdot B_1) & (A_1 \cdot B_2) & \dots & (A_1 \cdot B_p) \\
(A_2 \cdot B_1) & (A_2 \cdot B_2) & \dots & (A_2 \cdot B_p) \\
\vdots & \vdots & \ddots & \vdots \\
(A_m \cdot B_1) & (A_m \cdot B_2) & \dots & (A_m \cdot B_p)
\end{bmatrix}.

This is an outer product where the product inside is replaced with the inner product. In general, block matrix multiplication works exactly like ordinary matrix multiplication, but the real product inside is replaced with the matrix product.

An alternative method results when the decomposition is done the other way around, i.e. the first factor, A, is decomposed into column vectors and the second factor, B, is decomposed into row vectors:


{AB} =
\begin{bmatrix} A_1 & A_2 & \cdots & A_n \end{bmatrix}
\begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_n\end{bmatrix}
= A_1 \otimes B_1 %2B A_2 \otimes B_2 %2B \cdots %2B A_n \otimes B_n.

This method emphasizes the effect of individual column/row pairs on the result, which is a useful point of view with e.g. covariance matrices, where each such pair corresponds to the effect of a single sample point. An example for a small matrix:


\begin{bmatrix}
{\color{Red}1} & {\color{ForestGreen}2} & {\color{Blue}3} \\
{\color{Red}4} & {\color{ForestGreen}5} & {\color{Blue}6} \\
{\color{Red}7} & {\color{ForestGreen}8} & {\color{Blue}9} \\
\end{bmatrix}
\begin{bmatrix}
{\color{Red}a} & {\color{Red}d} \\
{\color{ForestGreen}b} & {\color{ForestGreen}e} \\
{\color{Blue}c} & {\color{Blue}f} \\
\end{bmatrix}
=
\begin{bmatrix}
{\color{Red}1a} & {\color{Red}1d} \\
{\color{Red}4a} & {\color{Red}4d} \\
{\color{Red}7a} & {\color{Red}7d} \\
\end{bmatrix}%2B
\begin{bmatrix}
{\color{ForestGreen}2b} & {\color{ForestGreen}2e} \\
{\color{ForestGreen}5b} & {\color{ForestGreen}5e} \\
{\color{ForestGreen}8b} & {\color{ForestGreen}8e} \\
\end{bmatrix}%2B
\begin{bmatrix}
{\color{Blue}3c} & {\color{Blue}3f} \\
{\color{Blue}6c} & {\color{Blue}6f} \\
{\color{Blue}9c} & {\color{Blue}9f} \\
\end{bmatrix}.

One more description of the matrix product may be obtained in the case when the second factor, B, is decomposed into the columns and the first factor, A, is viewed as a whole. Then A acts on the columns of B. If x is a vector and A is decomposed into columns, then

{A}x = A_1x_1%2BA_2x_2%2B\cdots%2BA_nx_n.

Algorithms for efficient matrix multiplication

Unsolved problems in computer science
What is the fastest algorithm for matrix multiplication?

The running time of square matrix multiplication, if carried out naïvely, is O( n^3 ). The running time for multiplying rectangular matrices (one m×p-matrix with one p×n-matrix) is O(mnp), however, more efficient algorithms exist, such as Strassen's algorithm, devised by Volker Strassen in 1969 and often referred to as "fast matrix multiplication". It is based on a way of multiplying two 2×2-matrices which requires only 7 multiplications (instead of the usual 8), at the expense of several additional addition and subtraction operations. Applying this recursively gives an algorithm with a multiplicative cost of O( n^{\log_{2}7}) \approx O(n^{2.807}). Strassen's algorithm is more complex compared to the naïve algorithm, and it lacks numerical stability. Nevertheless, it appears in several libraries, such as BLAS, where it is significantly more efficient for matrices with dimensions n > 100,[2] and is very useful for large matrices over exact domains such as finite fields, where numerical stability is not an issue.

The current O( n^k ) algorithm with the lowest known exponent k is a generalization of the Coppersmith–Winograd algorithm that has an asymptotic complexity of O(n2.3727) due to Vassilevska Williams.[3] This algorithm, and the Coppersmith-Winograd algorithm on which it is based, are similar to Strassen's algorithm: a way is devised for multiplying two k×k-matrices with fewer than k3 multiplications, and this technique is applied recursively. However, the constant coefficient hidden by the Big O notation is so large that these algorithms are only worthwhile for matrices that are too large to handle on present-day computers.[4]

Since any algorithm for multiplying two n×n-matrices has to process all 2×n2-entries, there is an asymptotic lower bound of \Omega (n^2) operations. Raz (2002) proves a lower bound of \Omega(n^2 \log n) for bounded coefficient arithmetic circuits over the real or complex numbers.

Cohn et al. (2003, 2005) put methods such as the Strassen and Coppersmith–Winograd algorithms in an entirely different group-theoretic context. They show that if families of wreath products of Abelian groups with symmetric groups satisfying certain conditions exist, then there are matrix multiplication algorithms with essentially quadratic complexity. Most researchers believe that this is indeed the case[5] – a lengthy attempt at proving this was undertaken by the late Jim Eve.[6] However, Alon, Shpilka and Umans have recently shown that some of these conjectures implying fast matrix multiplication are incompatible with another plausible conjecture, the sunflower conjecture.[7]

Because of the nature of matrix operations and the layout of matrices in memory, it is typically possible to gain substantial performance gains through use of parallelization and vectorization. It should therefore be noted that some lower time-complexity algorithms on paper may have indirect time complexity costs on real machines.

Scalar multiplication

The scalar multiplication of a matrix A = (aij) and a scalar r gives a product r A of the same size as A. The entries of r A are given by

 (r\mathbf{A})_{ij} = r \cdot a_{ij}. \,

For example, if

\mathbf{A}=\begin{bmatrix} a & b \\ c & d \end{bmatrix}

then

 r \cdot \mathbf{A}=\begin{bmatrix} r \cdot a & r \cdot b \\ r \cdot c & r \cdot d \end{bmatrix}.

If we are concerned with matrices over a more general ring, then the above multiplication is the left multiplication of the matrix A with scalar r while the right multiplication is defined to be

 (\mathbf{A}r)_{ij} = a_{ij} \cdot r. \,

When the underlying ring is commutative, for example, the real or complex number field, the two multiplications are the same. However, if the ring is not commutative, such as the quaternions, they may be different. For example


  i\begin{bmatrix} 
    i & 0 \\ 
    0 & j \\ 
  \end{bmatrix}
= \begin{bmatrix}
    -1 & 0 \\
     0 & k \\
  \end{bmatrix}
\ne \begin{bmatrix}
    -1 & 0 \\
    0 & -k \\
  \end{bmatrix}
= \begin{bmatrix}
    i & 0 \\
    0 & j \\
  \end{bmatrix}i.

Hadamard product

For two matrices of the same dimensions, we have the Hadamard product (named after French mathematician Jacques Hadamard), also known as the "entrywise product" and the "Schur product".[8]

Formally, for two matrices of the same dimensions:

A, B \in {\mathbb R}^{m \times n}

the Hadamard product A ○ B is a matrix of the same dimensions

 A \circ B \in {\mathbb R}^{m \times n}

with elements given by

 (A \circ B)_{i,j} = (A)_{i,j} \cdot (B)_{i,j}.

The Hadamard product is commutative, associative and distributive over addition, and is a principal submatrix of the Kronecker product. It appears in lossy compression algorithms such as JPEG.

Frobenius inner product

The Frobenius inner product, sometimes denoted A:B is the component-wise inner product of two matrices as though they are vectors. In other words, it is the sum of the entries of the Hadamard product, that is,

\mathbf{A}:\mathbf{B}=\sum_i\sum_j A_{ij} B_{ij} = \operatorname{trace}(\mathbf{A}^T \mathbf{B}) = \operatorname{trace}(\mathbf{A} \mathbf{B}^T).

This inner product induces the Frobenius norm.

Powers of matrices

Square matrices can be multiplied by themselves repeatedly in the same way that ordinary numbers can. This repeated multiplication can be described as a power of the matrix. Using the ordinary notion of matrix multiplication, the following identities hold for an n-by-n matrix A, a positive integer k, and a scalar c:

\mathbf{A}^0 = \mathbf{I}
\mathbf{A}^k = \prod_{i=1}^k \mathbf{A}
\ ( c\mathbf{A} )^k = c^k\mathbf{A}^k
\ \det (A^k) = \det(A)^k.

The naive computation of matrix powers is to multiply k times the matrix A to the result, starting with the identity matrix just like the scalar case. This can be improved using the binary representation of k, a method commonly used for scalars. An even better method is to use the eigenvalue decomposition of A.

Calculating high powers of matrices can be very time-consuming, but the complexity of the calculation can be dramatically decreased by using the Cayley–Hamilton theorem, which takes advantage of an identity found using the matrices' characteristic polynomial and gives a much more effective equation for Ak, which instead raises a scalar to the required power, rather than a matrix.

Powers of diagonal matrices

The kth power A^k of a diagonal matrix A , is the diagonal matrix whose diagonal entries are the kth powers of the corresponding entries of the original matrix A. The same is true for any analytic function of a diagonal matrix.


  A^k = \begin{bmatrix} 
    a_{11} & 0 & 0 & \cdots & 0 \\ 
    0 & a_{22} & 0 & \cdots & 0 \\ 
    0 & 0 & a_{33} & \cdots & 0 \\ 
    \vdots & \vdots & \vdots & \ddots & \vdots \\ 
    0 & 0 & 0 & \cdots & a_{nn}
  \end{bmatrix}^k =
\begin{bmatrix} 
    a_{11}^k & 0 & 0 & \cdots & 0 \\ 
    0 & a_{22}^k & 0 & \cdots & 0 \\ 
    0 & 0 & a_{33}^k & \cdots & 0 \\ 
    \vdots & \vdots & \vdots & \ddots & \vdots \\ 
    0 & 0 & 0 & \cdots & a_{nn}^k
  \end{bmatrix}

When raising an arbitrary matrix (not necessarily a diagonal matrix) to a power, it is often helpful to diagonalize the matrix first.

See also

Notes

  1. ^ Mary L. Boas, "Mathematical Methods in the Physical Sciences", Third Addition, Wiley, 2006, page 115
  2. ^ Press 2007, p. 108.
  3. ^ Virginia Vassilevska Williams. "Breaking the Coppersmith-Winograd barrier". http://www.cs.berkeley.edu/~virgi/matrixmult.pdf.  The original algorithm was presented by Don Coppersmith and Shmuel Winograd in 1990, has an asymptotic complexity of O(n2.376).
  4. ^ Robinson, Sara (2005), "Toward an Optimal Algorithm for Matrix Multiplication", SIAM News 38 (9), http://www.siam.org/pdf/news/174.pdf 
  5. ^ Robinson, 2005.
  6. ^ Eve, 2009.
  7. ^ Alon, Shpilka, Umans, On Sunflowers and Matrix Multiplication
  8. ^ (Horn & Johnson 1985, Ch. 5)

References

External links